154 research outputs found

    GRAPE for fast and scalable graph processing and random-walk-based embedding

    Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. 
Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
Funding: National Center for Gene Therapy and Drugs based on RNA Technology, PNRR-NextGenerationEU program G43C22001320007; United States Department of Health & Human Services National Institutes of Health (NIH) - USA NIH National Cancer Institute (NCI) U01-CA239108-02; Transition Grant Line 1A Project NIMI PARTENARIATI H2020' 1R24OD011883-01; United States Department of Health & Human Services National Institutes of Health (NIH) - USA U01-CA239108-02 DE-AC02-05CH11231; United States Department of Energy (DOE); European Union (EU) Marie Curie Actions PSR2015-1720GVALE_01 PID2021-128970OA-I0

    Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding

    We introduce a set of algorithms (Het-node2vec) that extend the original node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e. networks characterized by multiple types of nodes and edges. The resulting random walk samples capture both the structural characteristics of the graph and the semantics of the different types of nodes and edges. The proposed algorithms can focus their attention on specific node or edge types, allowing accurate representations also for underrepresented types of nodes/edges that are of interest for the prediction problem under investigation. These rich and well-focused representations can boost unsupervised and supervised learning on heterogeneous graphs. Comment: 20 pages, 5 figures
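
    For orientation, the second-order walk of the original node2vec method, which Het-node2vec extends, can be sketched as below. This is a minimal illustration and not the authors' code: it covers only the homogeneous case (no node or edge types), and the function name node2vec_walk and the adjacency-dictionary representation are assumptions made for the example. The parameters p (return) and q (in-out) follow node2vec's usual convention.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style second-order random walk.

    adj    : dict mapping each node to the set of its neighbours
             (undirected, unweighted; nodes must be sortable).
    start  : node where the walk begins.
    length : maximum number of nodes in the walk.
    p, q   : return and in-out parameters of node2vec.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move away, to distance 2 from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    # Hypothetical toy graph: a path 0 - 1 - 2 - 3.
    toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(node2vec_walk(toy, 0, 8, p=0.5, q=2.0, rng=random.Random(0)))
```

    A low p makes the walk more likely to backtrack, while a low q pushes it outward. A type-aware variant would additionally bias the choice of the next node by node or edge type, which this sketch leaves out.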

    parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

    BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data and a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF
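
    The data-balancing idea behind such a hyper-ensemble can be sketched as follows. This is an illustrative sketch under stated assumptions, not parSMURF's implementation: the function name balanced_partitions and its parameters are hypothetical, and oversampling is done here by plain resampling with replacement as a stand-in, since the abstract does not say which oversampling technique is used.

```python
import random


def balanced_partitions(positives, negatives, n_parts,
                        over_factor=2, under_ratio=1, seed=0):
    """Build one balanced training set per partition of the negatives.

    positives   : list of (rare) positive examples.
    negatives   : list of (abundant) negative examples.
    n_parts     : number of partitions, i.e. of base learners.
    over_factor : positives are grown to over_factor times their number.
    under_ratio : negatives kept per (oversampled) positive.
    """
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    # Split the negatives into n_parts disjoint partitions.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]

    training_sets = []
    for part in parts:
        # Oversample: keep every positive, then add resampled copies.
        n_extra = len(positives) * (over_factor - 1)
        pos = positives + [rng.choice(positives) for _ in range(n_extra)]
        # Undersample: draw a small random subset of this partition.
        n_neg = min(len(part), under_ratio * len(pos))
        neg = rng.sample(part, n_neg)
        training_sets.append((pos, neg))
    return training_sets


if __name__ == "__main__":
    # Hypothetical toy data: 5 positives against 1000 negatives.
    sets = balanced_partitions(list(range(5)), list(range(100, 1100)), n_parts=4)
    for pos, neg in sets:
        print(len(pos), len(neg))
```

    Each training set would then feed one base learner, with the ensemble combining their predictions. Because the partitions are disjoint, the learners can be trained independently, which is what makes the scheme straightforward to distribute across cores or cluster nodes.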

    GraPE: fast and scalable Graph Processing and Embedding

    Graph Representation Learning methods have enabled a wide range of learning problems to be addressed for data that can be represented in graph form. Nevertheless, several real-world problems in economy, biology, medicine and other fields raised relevant scaling problems with existing methods and their software implementation, due to the size of real-world graphs characterized by millions of nodes and billions of edges. We present GraPE, a software resource for graph processing and random-walk-based embedding that can scale with large and high-degree graphs and significantly speed up computation. GraPE comprises specialized data structures, algorithms, and a fast parallel implementation that displays several orders of magnitude improvement in empirical space and time complexity compared to state-of-the-art software resources, with a corresponding boost in the performance of machine learning methods for edge and node label prediction and for the unsupervised analysis of graphs. GraPE is designed to run on laptop and desktop computers, as well as on high-performance computing clusters.

    Metronomic Oral Vinorelbine: An Alternative Schedule in Elderly and Patients PS2 With Local/Advanced and Metastatic NSCLC Not Oncogene-addicted

    The MILES and ELVIS studies showed that vinorelbine is one of the best options for elderly patients with advanced non-small-cell lung cancer (NSCLC). Oral vinorelbine at standard schedule (60-80 mg/m2/weekly) has good activity in terms of response rates and progression-free survival. In recent years, a metronomic schedule of oral vinorelbine (40-50 mg/m2 three times a week, continuously) has been studied in phase II trials, especially in unfit and elderly patients. In the MOVE trial, metronomic oral vinorelbine had a clinical benefit [partial response (PR)+stable disease (SD) >12 weeks] in 58.1% of patients with mild toxicity. On this basis, in 2017 we started a phase II study with metronomic oral vinorelbine in elderly (over 70 years) or unfit [Eastern Cooperative Oncology Group performance score (ECOG-PS) of 2] patients with locally/advanced and metastatic NSCLC. Primary aims were clinical benefit (PR+SD ≥6 months) and toxicity; secondary aims were progression-free survival and overall survival.

    Volatile lipophilic substances management in case of fatal sniffing.

    Death due to inhalation of aliphatic hydrocarbons such as butane and propane is a particularly serious problem worldwide, with several fatal cases reported of people sniffing these volatile substances in order to "get high". Despite the number of published cases, there is no single approach to the management of fatal sniffing cases. In this paper we illustrate the management of volatile lipophilic substances in the case of a prisoner who died after sniffing a butane-propane gas mixture from prefilled camping stove gas canisters, discussing the comprehensive approach to the crime scene, the autopsy, histology and toxicology. A large set of accurate values for both butane and propane was obtained by gas chromatography-mass spectrometry analysis of the following post-mortem biological samples: peripheral blood, heart blood, vitreous humor, liver, lung, heart, brain/cerebral cortex, fat tissue and kidney. These values allowed an in-depth discussion of the cause of death. Following the proper sampling approach during autopsy plays a key role.

    Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.

    Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.